Lecture 07
Fundamental algorithms for scientific computing in Python
| Subpackage | Description | Subpackage | Description |
|---|---|---|---|
| cluster | Clustering algorithms | odr | Orthogonal distance regression |
| constants | Physical and mathematical constants | optimize | Optimization and root-finding routines |
| fftpack | Fast Fourier Transform routines | signal | Signal processing |
| integrate | Integration and ordinary differential equation solvers | sparse | Sparse matrices and associated routines |
| interpolate | Interpolation and smoothing splines | spatial | Spatial data structures and algorithms |
| io | Input and Output | special | Special functions |
| linalg | Linear algebra | stats | Statistical distributions and functions |
| ndimage | N-dimensional image processing | | |
scipy.cluster.vq.kmeans() also returns the distortion: the mean (non-squared) Euclidean distance between the observations passed and the centroids generated.
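For example, a minimal sketch on synthetic data (the two blobs and their parameters are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.vq import kmeans

# two well-separated Gaussian blobs in 2D
rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0, 0.5, (50, 2)),
                      rng.normal(5, 0.5, (50, 2))])

# k-means with k=2 returns the centroids and the distortion
centroids, distortion = kmeans(obs, 2)
```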
For general numeric integration in 1D we use scipy.integrate.quad(), which takes as arguments the function to be integrated and the lower and upper bounds of integration.
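For instance, integrating \(\sin(x)\) over \([0, \pi]\), whose exact value is 2:

```python
import numpy as np
from scipy.integrate import quad

# quad returns the integral value and an estimate of the absolute error
val, err = quad(np.sin, 0, np.pi)
```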
The PDF for a normal distribution is given by,
\[ f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \right) \]
We can check that we’ve implemented a valid PDF by integrating it from \(-\infty\) to \(\infty\),
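A minimal sketch of this check, defining a `norm_pdf()` helper directly from the formula above (the helper is also used by the truncated version below):

```python
import numpy as np
from scipy.integrate import quad

def norm_pdf(x, μ=0, σ=1):
    # normal density: (1 / (σ √(2π))) exp(-½ ((x-μ)/σ)²)
    return (1 / (σ * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - μ) / σ)**2)

# total probability should be 1
val, err = quad(norm_pdf, -np.inf, np.inf)
```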
Truncating the distribution to an interval \([a, b]\) gives the PDF
\[ f(x) = \begin{cases} \frac{c}{\sigma \sqrt{2 \pi}} \exp\left(-\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \right), & \text{for } a \leq x \leq b \\ 0, & \text{otherwise,} \\ \end{cases} \]
where \(c\) is the constant that makes the density integrate to 1.
```python
def trunc_norm_pdf(x, μ=0, σ=1, a=-np.inf, b=np.inf):
    if b < a:
        raise ValueError("b must be greater than a")
    x = np.asarray(x).reshape(-1)
    # normalizing constant c: reciprocal of the normal mass on [a, b]
    nc = 1 / quad(lambda x: norm_pdf(x, μ, σ), a, b)[0]
    full_pdf = nc * (1 / (σ * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - μ) / σ)**2)
    full_pdf[(x < a) | (x > b)] = 0
    return full_pdf
```

The PDF for a multivariate normal distribution is given by,
\[ f(\mathbf{x}) = \det{(2\pi\Sigma)}^{-1/2} \exp{\left(-\frac{1}{2} (\mathbf{x}-\mu)^T \Sigma^{-1}(\mathbf{x}-\mu) \right)} \]
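The `mv_norm()` helper used in the integration examples below is not shown here; one possible implementation of this density, evaluated at a single point, is:

```python
import numpy as np

def mv_norm(x, μ, Σ):
    # multivariate normal density det(2πΣ)^(-1/2) exp(-½ (x-μ)ᵀ Σ⁻¹ (x-μ))
    x, μ, Σ = np.asarray(x, float), np.asarray(μ, float), np.asarray(Σ, float)
    d = x - μ
    return float(np.linalg.det(2 * np.pi * Σ) ** -0.5
                 * np.exp(-0.5 * d @ np.linalg.solve(Σ, d)))
```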
Double and triple integrals are supported by dblquad() and tplquad() respectively (see nquad() for higher dimensions),
```python
from scipy.integrate import dblquad, tplquad

dblquad(lambda y, x: mv_norm([x, y], [0, 0], np.identity(2)),
        a=-np.inf, b=np.inf,
        gfun=lambda x: -np.inf, hfun=lambda x: np.inf)
```
```
(1.0000000000000322, 1.315012783660615e-08)
```
```python
tplquad(lambda z, y, x: mv_norm([x, y, z], [0, 0, 0], np.identity(3)),
        a=0, b=np.inf,
        gfun=lambda x: 0, hfun=lambda x: np.inf,
        qfun=lambda x, y: 0, rfun=lambda x, y: np.inf)
```
```
(0.12500000000036066, 1.4697203688867502e-08)
```
brent
=====
Options
-------
maxiter : int
Maximum number of iterations to perform.
xtol : float
Relative error in solution `xopt` acceptable for convergence.
disp: int, optional
If non-zero, print messages.
0 : no message printing.
1 : non-convergence notification messages only.
2 : print a message on convergence too.
3 : print iteration results.
Notes
-----
Uses inverse parabolic interpolation when possible to speed up
convergence of golden section method.
bounded
=======
Options
-------
maxiter : int
Maximum number of iterations to perform.
disp: int, optional
If non-zero, print messages.
0 : no message printing.
1 : non-convergence notification messages only.
2 : print a message on convergence too.
3 : print iteration results.
xatol : float
Absolute error in solution `xopt` acceptable for convergence.
golden
======
Options
-------
xtol : float
Relative error in solution `xopt` acceptable for convergence.
maxiter : int
Maximum number of iterations to perform.
disp: int, optional
If non-zero, print messages.
0 : no message printing.
1 : non-convergence notification messages only.
2 : print a message on convergence too.
3 : print iteration results.
message:
Optimization terminated successfully;
The returned value satisfies the termination criteria
(using xtol = 1.48e-08 )
success: True
fun: -1.0
x: 5.000000000618556
nit: 8
nfev: 11
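A result of this shape can be produced by minimize_scalar(); the objective below is an assumption chosen so the minimum sits at \(x = 5\) with value \(-1\), matching the output above, and is not necessarily the lecture's function:

```python
from scipy.optimize import minimize_scalar

# hypothetical objective: parabola with minimum -1 at x = 5
res = minimize_scalar(lambda x: (x - 5)**2 - 1)
```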
\[ f(x,y) = (1-x)^2 + 100(y-x^2)^2 \]
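This is the Rosenbrock function, whose global minimum is at \((1, 1)\). A minimal sketch of minimizing it with scipy.optimize.minimize (the starting point is an assumption):

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(v):
    # (1-x)² + 100 (y - x²)², minimized at (1, 1)
    x, y = v
    return (1 - x)**2 + 100 * (y - x**2)**2

res = minimize(rosenbrock, x0=[-1.0, 2.0])
```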
scipy.stats implements classes for 104 continuous and 19 discrete distributions, each exposing a common set of methods:
- rvs - Random Variates
- pdf - Probability Density Function
- cdf - Cumulative Distribution Function
- sf - Survival Function (1-CDF)
- ppf - Percent Point Function (Inverse of CDF)
- isf - Inverse Survival Function (Inverse of SF)
- stats - Return mean, variance, (Fisher's) skew, or (Fisher's) kurtosis
- moment - non-central moments of the distribution
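A quick sketch of these methods on the standard normal distribution:

```python
from scipy.stats import norm

norm.pdf(0)                  # density at 0
norm.cdf([-1, 0, 1])         # cumulative probabilities
norm.ppf(0.5)                # median via the inverse CDF
norm.stats(moments="mvsk")   # mean, variance, skew, kurtosis
norm.rvs(size=5)             # five random draws
```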
Model parameters can be passed directly to any of the methods, or a distribution can be constructed using a specific set of parameters, which is known as freezing.
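For example, a frozen normal (the parameter values here are illustrative assumptions):

```python
from scipy.stats import norm

d = norm(loc=2.5, scale=2)   # frozen distribution with μ=2.5, σ=2
d.mean(), d.std()            # methods no longer need the parameters
```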
Maximum likelihood estimation is possible via the fit() method,
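A sketch of such a fit; the sample, its size, and the true parameters below are assumptions chosen to resemble the (loc, scale) tuple shown:

```python
from scipy.stats import norm

# simulate data from a normal with loc=2.5, scale=2, then recover the MLE
x = norm.rvs(loc=2.5, scale=2, size=1000, random_state=0)
norm.fit(x)   # returns the maximum likelihood (loc, scale) estimates
```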
(2.5314811643075235, 1.946132398754459)